Evaluation of the Stretch S6 Hybrid Reconfigurable Embedded CPU Architecture for Power-Efficient Scientific Computing
نویسندگان
چکیده
Embedded CPUs typically use much less power than desktop or server CPUs but provide limited or no support for floating-point arithmetic. Hybrid reconfigurable CPUs combine fixed and reconfigurable computing fabrics to balance better execution performance and power consumption. We show how a Stretch S6 hybrid reconfigurable CPU (S6) can be extended to natively support double precision floating-point arithmetic. For lower precision number formats, multiple parallel arithmetic units can be implemented. We evaluate if the superlinear performance improvement of floating-point multiplication on reconfigurable fabrics can be exploited in the framework of a hybrid reconfigurable CPU. We provide an in-depth investigation of data paths to and from the S6 reconfigurable fabric and present peak and sustained throughput as a function of wide registers used and total operand size. We demonstrate the effect of the given interface when using a floating-point fused multiply-accumulate (FMA) SIMD unit to accelerate the LINPACK benchmark. We identify a mismatch between the size of the S6’s reconfigurable fabric and the available interface bandwidth as the major bottleneck limiting performance which makes it a poor choice for scientific workloads relying on native support for floating-point arithmetic.
منابع مشابه
System-level performance evaluation of reconfigurable processors
Reconfigurable architectures that tightly integrate a standard CPU core with a field-programmable hardware structure have recently been receiving increased attention. The design of such a hybrid reconfigurable processor involves a multitude of design decisions regarding the field-programmable structure as well as its system integration with the CPU core. Determining the impact of these design d...
متن کاملEvaluation of Heterogeneous Multicore Architecture with AAC-LC Stereo Encoding
This paper describes a heterogeneous multi-core processor (HMCP) architecture which integrates general purpose processors (CPU) and accelerators (ACC) to achieve high-performance as well as low-power consumption for SoCs of embedded systems. Memory architecture of CPUs and ACCs were unified to improve programming and compiling efficiency. For preliminary evaluation of the HMCP architecture, AAC...
متن کاملA Run-Time Partitioning Algorithm for RTOS on Reconfigurable Hardware
In today’s system design, reconfigurable computing plays more and more an important role. By the extension of reconfigurable devices like FPGAs with one or more CPUs new challenges in system design should be solved. These new hybrid FPGAs (e.g. Virtex-II Pro), provides a hardcore general-purpose processor (GPP) embedded into a field of programmable gate arrays. Furthermore, they offer partial r...
متن کاملImplementation of VlSI Based Image Compression Approach on Reconfigurable Computing System - A Survey
Image data require huge amounts of disk space and large bandwidths for transmission. Hence, imagecompression is necessary to reduce the amount of data required to represent a digital image. Thereforean efficient technique for image compression is highly pushed to demand. Although, lots of compressiontechniques are available, but the technique which is faster, memory efficient and simple, surely...
متن کاملASTRO: Synthesizing application-specific reconfigurable hardware traces to exploit memory-level parallelism
Emerging integrated CPU + FPGA hybrid platforms, such as the Extensible Processing Platform architecture from Xilinx [1], offer unprecedented opportunity to achieving both multifunctionality and real-time responsiveness for memory-intensive embedded applications. However, how to cost-effectively synthesize application-specific hardware constructs that fully exploit memory-level parallelism rema...
متن کامل